2. CodeAct Makes LLMs Better Agents
2.1. What is CodeAct?
Figure 2: multi turn
https://github.com/xingyaoww/code-act/blob/main/figures/overview.png?raw=true
The agent writes sympy code
E. Example Prompt for CodeAct
Towards Unified Alignment Between Agents, Humans, and Environment
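The multi-turn loop in Figure 2 can be sketched roughly as below: each code action is executed, and its output (or its error message) is returned to the agent as the next observation. This is a minimal sketch, not the authors' implementation. `execute_action` and `run_episode` are hypothetical names, the scripted `actions` list stands in for the LLM, and a real system would run the code in a sandbox rather than with a bare `exec`.

```python
import contextlib
import io
import traceback


def execute_action(code: str, namespace: dict) -> str:
    """Run one code action and return the observation (stdout or traceback)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Assumption: no sandboxing here; do not run untrusted code this way.
            exec(code, namespace)
    except Exception:
        # The error text is returned so the agent can revise its code next turn.
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


def run_episode(actions, namespace=None):
    """Feed scripted actions through the executor, one turn at a time.

    `actions` is a hypothetical stand-in for the LLM's successive outputs.
    The shared namespace lets later turns reuse variables from earlier turns.
    """
    namespace = {} if namespace is None else namespace
    return [execute_action(code, namespace) for code in actions]


observations = run_episode([
    "x = 6 * 7",              # turn 1: define a variable, no output
    "print(x)",               # turn 2: reuse state from turn 1
    "print(undefined_name)",  # turn 3: fails, the traceback becomes the observation
])
```

Because the namespace persists across turns, the agent can build on earlier results, and a failed turn yields a traceback it can react to instead of ending the episode.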
2.2. CodeAct Shows the Promise as a Strong Tool Use Framework
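Experiment: which action format (text, JSON, or CodeAct) leads to correct atomic tool calls
Table A.6 gives examples of each format
The authors' hypothesis: LLMs see large amounts of code during training, so CodeAct may be a natural format for them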
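To make the three formats concrete, the sketch below expresses the same atomic tool call as text, as JSON, and as code, and dispatches each to one tool. The tool `add_agenda`, the field names, and the exact string layouts are assumptions made for illustration; they are not copied from Table A.6.

```python
import json


def add_agenda(content: str, time: str) -> str:
    """Hypothetical tool used only for this illustration."""
    return f"added {content!r} at {time}"


TOOLS = {"add_agenda": add_agenda}


def call_from_text(action: str) -> str:
    # Naive "key: value" parsing; it breaks if a value itself contains ", ".
    fields = dict(part.split(": ", 1) for part in action.split(", "))
    name = fields.pop("Action")
    return TOOLS[name](**fields)


def call_from_json(action: str) -> str:
    payload = json.loads(action)
    return TOOLS[payload["tool"]](**payload["args"])


def call_from_code(action: str) -> str:
    # Evaluate one call expression with only the tools in scope.
    # Assumption: trusted input; this is not a real sandbox.
    return eval(action, {"__builtins__": {}}, dict(TOOLS))


text_result = call_from_text(
    "Action: add_agenda, content: Meeting with John, time: 2023-10-26 09:00"
)
json_result = call_from_json(
    '{"tool": "add_agenda", "args": {"content": "Meeting with John", '
    '"time": "2023-10-26 09:00"}}'
)
code_result = call_from_code(
    'add_agenda(content="Meeting with John", time="2023-10-26 09:00")'
)
```

All three strings trigger the same call. For a single atomic action the formats differ only in surface syntax, which is why this setting isolates the effect of format familiarity.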
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
Results are in Table 2
For most LLMs, CodeAct achieves comparable or better performance even in atomic actions (the simplistic tool use scenario)
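In Table 2, CodeAct is either the best or the second-best format
The claim is that it works for both open and closed LLMs (based on the totals of best-performing formats)
For open-source LLMs, CodeAct improves performance (JSON ranks last)
For closed-source LLMs, JSON works well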
2.3. CodeAct Gets More Done with Fewer Interactions
M3ToolEval, a benchmark prepared for this paper (Table A.7)
multiple calls to multiple tools in multi-turn interactions
F. M3ToolEval Prompt
Implementation? https://github.com/xingyaoww/code-act/blob/d607f56c9cfe9e8632ebaf65dcaf2b4b7fe1c6f8/scripts/eval/m3tooleval/main.py
Figure 1: Comparison between CodeAct and Text / JSON as action
https://github.com/xingyaoww/code-act/blob/main/figures/json-text-comparison.png?raw=true
Instruction
Determine the most cost-effective country to purchase the smartphone model "CodeAct 1". The countries to consider are the USA, Japan, Germany, and India.
Available APIs (five)
lookup_rates(country: str) -> (float, float)
convert_and_tax(price: float, exchange_rate: float, tax_rate: float) -> float
estimate_final_price(converted_price: float, shipping_cost: float) -> float
lookup_phone_price(model: str, country: str) -> float
estimate_shipping_cost(destination_country: str) -> float
The agent writes code as its action
Iterates over the countries with a for loop
Uses Python's built-in min()
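The code action for the Figure 1 instruction might look roughly like the sketch below. The five function signatures come from the API list above, but their bodies are stubs: the price, rate, and shipping numbers are placeholder data invented for this sketch. It is also an assumption that `lookup_rates` returns `(exchange_rate, tax_rate)` in that order, that `convert_and_tax` applies the tax multiplicatively, and that `estimate_final_price` simply adds the shipping cost.

```python
# --- Stub tools: placeholder data, invented for this sketch only ---
PRICES = {"USA": 1000.0, "Japan": 140000.0, "Germany": 900.0, "India": 80000.0}
RATES = {  # assumed order: (exchange rate to USD, tax rate)
    "USA": (1.0, 0.08),
    "Japan": (0.007, 0.10),
    "Germany": (1.1, 0.19),
    "India": (0.012, 0.18),
}
SHIPPING = {"USA": 10.0, "Japan": 30.0, "Germany": 25.0, "India": 40.0}


def lookup_rates(country: str) -> tuple[float, float]:
    return RATES[country]


def convert_and_tax(price: float, exchange_rate: float, tax_rate: float) -> float:
    # Assumption: convert to a common currency, then apply tax.
    return price * exchange_rate * (1 + tax_rate)


def estimate_final_price(converted_price: float, shipping_cost: float) -> float:
    # Assumption: the final price is the taxed price plus shipping.
    return converted_price + shipping_cost


def lookup_phone_price(model: str, country: str) -> float:
    return PRICES[country]


def estimate_shipping_cost(destination_country: str) -> float:
    return SHIPPING[destination_country]


# --- The agent's single code action ---
countries = ["USA", "Japan", "Germany", "India"]
final_prices = {}
for country in countries:
    exchange_rate, tax_rate = lookup_rates(country)
    local_price = lookup_phone_price("CodeAct 1", country)
    converted = convert_and_tax(local_price, exchange_rate, tax_rate)
    shipping = estimate_shipping_cost(country)
    final_prices[country] = estimate_final_price(converted, shipping)

# min() over the dict keys, ranked by their final price
best_country = min(final_prices, key=final_prices.get)
print(best_country, final_prices[best_country])
```

One action covers all four countries and all five tools. With text or JSON actions, each tool call would typically need its own turn, which is the point Figure 1 makes about fewer interactions.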